Active learning: a step towards automating medical concept extraction
Abstract
OBJECTIVE This paper presents an automatic, active learning-based system for the extraction of medical concepts from clinical free-text reports. Specifically, it determines (1) the contribution of active learning to reducing the annotation effort and (2) the robustness of an incremental active learning framework across different selection criteria and data sets.
MATERIALS AND METHODS The comparative performance of an active learning framework and a fully supervised approach was investigated to study how active learning reduces the annotation effort while achieving the same effectiveness as a supervised approach. Conditional random fields were used as the supervised method, and least confidence and information density as the 2 selection criteria for the active learning framework. The effect of incremental learning vs standard learning on the robustness of the models within the active learning framework with different selection criteria was also investigated. The following 2 clinical data sets were used for evaluation: the Informatics for Integrating Biology and the Bedside/Veterans Affairs (i2b2/VA) 2010 natural language processing challenge and the Shared Annotated Resources/Conference and Labs of the Evaluation Forum (ShARe/CLEF) 2013 eHealth Evaluation Lab.
RESULTS The annotation effort saved by active learning to achieve the same effectiveness as supervised learning is up to 77%, 57%, and 46% of the total number of sequences, tokens, and concepts, respectively. Compared with the random sampling baseline, the saving is at least doubled.
CONCLUSION Incremental active learning is a promising approach for building effective and robust medical concept extraction models while significantly reducing the burden of manual annotation.
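The two selection criteria named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so this assumes the commonly used definitions: least confidence scores a sequence as 1 minus the probability of its most likely labeling, and information density weights that score by the sequence's average similarity to the rest of the unlabeled pool. The function names, the `beta` parameter, the cosine similarity, and the toy data are illustrative assumptions, not the authors' implementation; in practice the probabilities would come from the trained conditional random field.

```python
import math

def least_confidence(best_path_prob):
    """Uncertainty of a sequence: 1 - P(most likely labeling | sequence)."""
    return 1.0 - best_path_prob

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def information_density(index, best_path_probs, vectors, beta=1.0):
    """Least-confidence score weighted by average similarity to the pool."""
    base = least_confidence(best_path_probs[index])
    avg_sim = sum(cosine(vectors[index], v) for v in vectors) / len(vectors)
    return base * avg_sim ** beta

def select_batch(scores, batch_size):
    """Indices of the highest-scoring sequences, to be sent for annotation."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]

# Toy unlabeled pool (hypothetical values): the model's probability for its
# best labeling of each sequence, and a feature vector per sequence.
probs = [0.9, 0.5, 0.6]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

lc_scores = [least_confidence(p) for p in probs]
id_scores = [information_density(i, probs, vectors) for i in range(len(probs))]

print("least confidence picks:", select_batch(lc_scores, 1))
print("information density picks:", select_batch(id_scores, 1))
```

On this toy pool the two criteria disagree: least confidence picks the most uncertain sequence, while information density prefers a slightly less uncertain sequence that is more representative of the pool.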
Similar resources
Towards 'Interactive' Active Learning in Multi-view Feature Sets for Information Extraction
Research in multi-view active learning has typically focused on algorithms for selecting the next example to label. This is often at the cost of lengthy wait-times for the user between each query iteration. We deal with a real-world information extraction task, extracting attribute-value pairs from product descriptions, where the learning system needs to be interactive and the user's time needs ...
QuickUMLS: a fast, unsupervised approach for medical concept extraction
Entity extraction is a fundamental step in many health informatics systems. In recent years, tools such as MetaMap and cTAKES have been widely used for medical concept extraction on medical literature and clinical notes; however, relatively little interest has been placed on their scalability to large datasets. In this work, we present QuickUMLS: a fast, unsupervised, approximate dictionary mat...
The Relationship between Social Anxiety and Self-concept among Fifth Grade Female Students in Jahrom /Iran
Background & Aims: Children and adolescents can make full use of their mental capacity and potential capabilities if they benefit from a positive attitude towards their surrounding environment and a strong incentive for being active in the community. The aim of this study was to explore the relationship of social anxiety including social phobia, social interaction and maladaptive behavior with s...
Towards Representation Learning for Biomedical Concept Detection in Medical Images: UA.PT Bioinformatics in ImageCLEF 2017
Representation learning is a field that has rapidly evolved during the last decade, with much of this progress being driven by the latest breakthroughs in deep learning. Digital medical imaging is a particularly interesting application since representation learning may enable better medical decision support systems. ImageCLEFcaption focuses on automatic information extraction from biomedical im...
Towards Automatic Establishment of Model Dependencies Using Formal Concept Analysis
Software evolution is an iterative and incremental process that encompasses the modification and alteration of software models at different levels of abstraction. These modifications are usually performed independently, but the objects to which they are applied are in most cases mutually dependent. Inconsistencies and drift among related artifacts may be created if the effects of an alterat...
Journal title:
- Journal of the American Medical Informatics Association : JAMIA
Volume 23, Issue 2
Pages -
Publication date 2016